Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing

Authors

  • David B. Johnson
  • Willy Zwaenepoel
Abstract

In a distributed system using message logging and checkpointing to provide fault tolerance, there is always a unique maximum recoverable system state, regardless of the message logging protocol used. The proof of this relies on the observation that the set of system states that have occurred during any single execution of a system forms a lattice, with the sets of consistent and recoverable system states as sublattices. The maximum recoverable system state never decreases, and if all messages are eventually logged, the domino effect cannot occur. This paper presents a general model for reasoning about recovery in such a system and, based on this model, an efficient algorithm for determining the maximum recoverable system state at any time. This work unifies existing approaches to fault tolerance based on message logging and checkpointing, and improves on existing methods for optimistic recovery in distributed systems.
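
The lattice argument in the abstract can be illustrated with a small sketch. This is not the paper's efficient algorithm; it is a brute-force toy under stated assumptions. A system state is modeled as a tuple of state-interval indices, one per process. The names `deps`, `stable`, `is_consistent`, `is_recoverable`, `join`, and `max_recoverable` are hypothetical, and "stable up to index k" is a simplification of what it means for a state interval to be recreatable from a checkpoint plus logged messages.

```python
from itertools import product

# Assumed model (not taken from the abstract):
#   deps[p][k][q] = highest state interval of process q that
#                   state interval k of process p depends on.
#   stable[p]     = highest state interval of process p that can be
#                   recreated from its checkpoint and logged messages.

def is_consistent(state, deps):
    """No process depends on an interval beyond the one chosen for its peer."""
    n = len(state)
    return all(deps[p][state[p]][q] <= state[q]
               for p in range(n) for q in range(n))

def is_recoverable(state, deps, stable):
    """Consistent, and every chosen interval is stable."""
    return (is_consistent(state, deps)
            and all(state[p] <= stable[p] for p in range(len(state))))

def join(a, b):
    """Lattice join: componentwise maximum of two system states."""
    return tuple(max(x, y) for x, y in zip(a, b))

def max_recoverable(deps, stable):
    """Join of all recoverable states, found by exhaustive search."""
    best = None
    for s in product(*(range(len(d)) for d in deps)):
        if is_recoverable(s, deps, stable):
            best = s if best is None else join(best, s)
    return best

# Two processes with three state intervals each (toy data).
deps = [
    [(0, 0), (1, 0), (2, 1)],  # process 0: interval 2 depends on p1's interval 1
    [(0, 0), (1, 1), (1, 2)],  # process 1: interval 1 depends on p0's interval 1
]

print(max_recoverable(deps, stable=(2, 0)))
print(max_recoverable(deps, stable=(2, 2)))
```

In this toy model the join of two recoverable states is itself recoverable, which is why folding `join` over all of them yields a single maximum. The exhaustive search is exponential in the number of processes and is only meant to make the definitions concrete.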

Similar resources

New Causal Message Logging Protocol with Asynchronous Checkpointing for Distributed Systems

Causal message logging is an efficient approach for tolerating process failures in distributed systems because it has the advantages of both the pessimistic and optimistic message logging approaches. However, traditional causal message logging protocols prevent live processes from continuing their computation and require some synchronous logging to stable storage during recovery....

Efficient Transparent Optimistic Rollback Recovery for Distributed Application Programs

Existing rollback-recovery methods using consistent checkpointing may cause high overhead for applications that frequently send output to the “outside world,” since a new consistent checkpoint must be written before the output can be committed, whereas existing methods using optimistic message logging may cause large delays in committing output, since processes may buffer received messages arbi...

An Asynchronous Recovery Scheme based on Optimistic Message Logging for the Mobile Computing Systems

To provide fault tolerance for mobile computing systems, many checkpointing-based recovery schemes have been proposed. However, considering the nature of the mobile environment, in which some mobile hosts (MHs) are often disconnected from the network and the probability of concurrent failures on MHs is high, any kind of coordination during the checkpointing and even during the recovery m...

An Efficient Optimistic Message Logging Scheme for the Recoverable Mobile Computing Systems

This paper presents an efficient scheme for implementing optimistic message logging and asynchronous recovery in the mobile computing environment. Most coordinated checkpointing schemes may not be suitable for the mobile environment, since unreliable mobile hosts and fragile network connections may hinder any kind of coordination for checkpointing and recovery. In this paper,...

Distributed System Fault Tolerance Using Message Logging and Checkpointing

Fault tolerance can allow processes executing in a computer system to survive failures within the system. This thesis addresses the theory and practice of transparent fault tolerance methods using message logging and checkpointing in distributed systems. A general model for reasoning about the behavior and correctness of these methods is developed, and the design, implementation, and performance of ...

Journal:
  • J. Algorithms

Volume 11  Issue 

Pages  -

Publication date 1990